
OPUS - University of Technology Sydney

Repository: Freie Universität Berlin (FU), Math Department (fu_mi_publications)

On pairwise distances and median score of three genomes under DCJ

Author: A Bergeron
A Caprara
A Goeffon
AW Xu
E Tannier
MA Alekseyev
Max A Alekseyev
R Lenne
S Yancopoulos
Sergey Aganezov
V Rajan
Publication venue: Springer Science and Business Media LLC
Publication date: 22/10/2012

In comparative genomics, the rearrangement distance between two genomes (equal to the minimal number of genome rearrangements required to transform one genome into the other) is often used to measure their evolutionary remoteness. The generalization of this measure to three genomes is known as the median score (the minimum, over all genomes M, of the total distance from M to the three given genomes), and a genome attaining it is called a median genome. Whereas the rearrangement distance between two genomes can be computed in linear time, computing the median score for three genomes is NP-hard. This motivates the search for simpler and faster approximations of the median score, the most natural of which appears to be half the sum of pairwise distances, which is in fact a lower bound for the median score. In this work, we study the relationship and interplay between the pairwise distances of three genomes and their median score under the Double-Cut-and-Join (DCJ) rearrangement model. Most remarkably, we show that while a rearrangement may change the sum of pairwise distances by at most 2 (and thus change the lower bound by at most 1), even the most "powerful" rearrangements in this respect, those that increase the lower bound by 1 by moving one genome farther away from each of the other two (which we call strong), do not necessarily affect the median score. This observation implies that the two measures are not as well correlated as intuition may suggest. We further prove that the median score attains the lower bound exactly on the triples of genomes that can be obtained from a single genome with strong rearrangements. While two-thirds of the sum of pairwise distances is an upper bound for the median score, its tightness remains unclear. Nonetheless, we show that the difference between the median score and its lower bound is not bounded by a constant.

Comment: Proceedings of the 10th Annual RECOMB Satellite Workshop on Comparative Genomics (RECOMB-CG), 2012 (to appear).
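The two bounds on the median score can be illustrated with a small sketch. It assumes genomes made only of circular chromosomes over the same gene set, without duplications, for which the DCJ distance is N - C (N genes, C cycles of the adjacency graph). The helper names (`extremity_mates`, `dcj_distance`, `median_score_of`, `median_bounds`) are hypothetical and not taken from the paper. The lower bound follows from the triangle inequality; the upper bound holds because the best of the three input genomes, used as a candidate median, scores at most the average of their three scores, which is two-thirds of the pairwise sum.

```python
from itertools import combinations


def extremity_mates(genome):
    """Map each gene extremity to the extremity it is adjacent to.

    Assumes every chromosome is circular and given as a list of signed gene
    ids (e.g. [1, -2, 3]); gene g has a tail (g, "t") and a head (g, "h").
    """
    mate = {}
    for chrom in genome:
        # +g is read tail -> head, -g is read head -> tail.
        ends = [((abs(g), "t"), (abs(g), "h")) if g > 0 else ((abs(g), "h"), (abs(g), "t"))
                for g in chrom]
        for i, (_, right) in enumerate(ends):
            left_next = ends[(i + 1) % len(ends)][0]  # wrap around: circular chromosome
            mate[right] = left_next
            mate[left_next] = right
    return mate


def dcj_distance(a, b):
    """DCJ distance N - C between two circular genomes on the same genes,
    where C is the number of cycles in their adjacency graph."""
    mate_a, mate_b = extremity_mates(a), extremity_mates(b)
    n_genes = len(mate_a) // 2
    seen, cycles = set(), 0
    for start in mate_a:
        if start in seen:
            continue
        cycles += 1
        x = start
        while x not in seen:
            # Follow an A-adjacency, then a B-adjacency, around the cycle.
            seen.add(x)
            y = mate_a[x]
            seen.add(y)
            x = mate_b[y]
    return n_genes - cycles


def median_score_of(candidate, genomes):
    """Score of one candidate median: sum of its DCJ distances to the inputs."""
    return sum(dcj_distance(candidate, g) for g in genomes)


def median_bounds(g1, g2, g3):
    """Return (lower bound, best input genome's score, two-thirds bound)."""
    genomes = (g1, g2, g3)
    total = sum(dcj_distance(x, y) for x, y in combinations(genomes, 2))
    best_input = min(median_score_of(g, genomes) for g in genomes)
    return total / 2, best_input, 2 * total / 3


G1, G2, G3 = [[1, 2, 3]], [[1, -2, 3]], [[1, 2, -3]]
print(dcj_distance(G1, G2), dcj_distance(G1, G3), dcj_distance(G2, G3))  # 1 1 2
print(median_bounds(G1, G2, G3)[:2])  # (2.0, 2)
```

In this toy triple, G1 already reaches the lower bound, so it is a median genome with median score 2.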

arXiv.org e-Print Archive

Cassis: detection of genomic rearrangement breakpoints

Author: C. Baudet
C. Gautier
C. Lemaitre
Darling
E. Tannier
Lemaitre
M.-F. Sagot
Z. Dias
Publication venue: Oxford University Press
Publication date: 01/01/2010

Summary: Genomes undergo large structural changes that alter their organization. The chromosomal regions affected by these rearrangements are called breakpoints, while those that have not been rearranged are called synteny blocks. Lemaitre et al. presented a new method to precisely delimit rearrangement breakpoints in a genome by comparison with the genome of a related species. Taking as input a list of one-to-one orthologous genes found in the genomes of two species, the method builds a set of reliable, non-overlapping synteny blocks and refines the regions that are not contained in them. By aligning each breakpoint sequence against its specific orthologous sequences in the other species, we can look for weak similarities inside the breakpoint, thus extending the synteny blocks and narrowing the breakpoints. The identification of the narrowed breakpoints relies on a segmentation algorithm and is statistically assessed. Here, we present the package Cassis, which implements this method for the precise detection of genomic rearrangement breakpoints.
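As a rough illustration of what synteny blocks and breakpoints mean at the gene level, the sketch below splits one genome's ordered list of one-to-one orthologs into maximal runs whose consecutive genes are also adjacent, with consistent orientation, in the other genome; each split point is a breakpoint. The function `synteny_blocks` is hypothetical and is not Cassis's method: it assumes a single linear chromosome per genome and omits the reliability filtering, sequence alignment, and statistical segmentation described above.

```python
def synteny_blocks(order_a, order_b):
    """Split genome A (signed gene ids, one linear chromosome) into maximal
    runs whose consecutive genes are adjacent, in the same relative
    orientation, in genome B. Both genomes share the same one-to-one orthologs."""
    if not order_a:
        return []
    # Position and sign of each gene in genome B.
    pos = {abs(g): (i, 1 if g > 0 else -1) for i, g in enumerate(order_b)}
    blocks = [[order_a[0]]]
    for prev, cur in zip(order_a, order_a[1:]):
        (i, s_prev), (j, s_cur) = pos[abs(prev)], pos[abs(cur)]
        # Relative orientation: +1 if A and B agree on the gene's strand.
        r_prev = s_prev * (1 if prev > 0 else -1)
        r_cur = s_cur * (1 if cur > 0 else -1)
        # Conserved adjacency: same relative orientation, and neighbours in B
        # (cur follows prev if r = +1, precedes it if the block is inverted).
        if r_prev == r_cur and j - i == r_prev:
            blocks[-1].append(cur)
        else:
            blocks.append([cur])  # a breakpoint lies between prev and cur
    return blocks


print(synteny_blocks([1, 2, 3, 4, 5], [1, 2, -4, -3, 5]))  # [[1, 2], [3, 4], [5]]
```

Here the inverted segment 3, 4 in the second genome produces two breakpoints, one on each side of it.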

HAL-CentraleSupelec

CiteSeerX

INRIA a CCSD electronic archive server

A Unifying Model of Genome Evolution Under Parsimony

Author: A Bergeron
A Caprara
AE Darling
AW Xu
B Paten
B Raphael
Benedict Paten
C Chauve
D Bienstock
Daniel R Zerbino
David Haussler
E Tannier
G Bourque
Glenn Hickey
I Elias
J Edmonds
J Felsenstein
J Kim
J Ma
L Chindelevitch
LL Wang
M Alekseyev
M Bader
M Blanchette
M Shao
MD Braga
N El-Mabrouk
O Westesson
P Medvedev
S Hannenhalli
S Yancopoulos
W Day
W Miller
YS Song
Publication date: 12/05/2014

We present a data structure called a history graph that offers a practical basis for the analysis of genome evolution. It conceptually simplifies the study of parsimonious evolutionary histories by representing both substitutions and double cut and join (DCJ) rearrangements in the presence of duplications. The problem of constructing parsimonious history graphs thus subsumes related maximum parsimony problems in the fields of phylogenetic reconstruction and genome rearrangement. We show that tractable functions can be used to define upper and lower bounds on the minimum number of substitutions and DCJ rearrangements needed to explain any history graph. These bounds become tight for a special type of unambiguous history graph called an ancestral variation graph (AVG), whose combinatorial structure constrains the number of operations required. We finally demonstrate that for a given history graph G, a finite set of AVGs describes all parsimonious interpretations of G, and this set can be explored with a few sampling moves.

Comment: 52 pages, 24 figures

arXiv.org e-Print Archive

eScholarship - University of California

On the PATHGROUPS approach to rapid small phylogeny

Author: A Caprara
AC Siepel
AW Xu
C Zheng
Chunfang Zheng
D Sankoff
David Sankoff
E Tannier
G Fertin
KP Byrne
N El-Mabrouk
R Warren
S Yancopoulos
SM Hedtke
Z Adam
Publication venue: BioMed Central
Publication date: 01/01/2011

We present a data structure enabling rapid heuristic solutions to the ancestral genome reconstruction problem for given phylogenies under genomic rearrangement metrics. The efficiency of the greedy algorithm is due to fast updating of the structure during run time and a simple priority scheme for choosing the next step. Since accuracy deteriorates for sets of highly divergent genomes, we investigate strategies for improving accuracy and expanding the range of data sets where accurate reconstructions can be expected. These include a more refined priority system, a two-step look-ahead, and iterative local improvements based on the median version of the problem, incorporating simulated annealing. We apply this to a set of yeast genomes to corroborate a recent gene sequence-based phylogeny.

Directory of Open Access Journals

Reconstructing the History of Yeast Genomes

Author: A Bhutkar
AU Sinha
B Dutrillaux
B Llorente
C Soighe
C Zheng
D Sankoff
David Sankoff
E Tannier
FS Dietrich
Jianzhi Zhang
JL Gordon
KH Wolfe
KP Byrne
M Kellis
N Martin
P Pevzner
WJ Murphy
Publication venue: Public Library of Science
Publication date: 01/05/2009

Directory of Open Access Journals

Sampling and counting genome rearrangement scenarios

Author: A Bergeron
A Caprara
A Darling
A Karzanov
A Ouangraoua
A Rajaraman
AC Siepel
B Larget
C Chauve
C Zheng
D Sankoff
DVM Braga
E Tannier
G Brightwell
Heather Smith
I Miklós
István Miklós
JS Liu
KM Swenson
L Lovász
LG Valiant
MA Alekseyev
MR Jerrum
N Metropolis
P Feijão
PL Erdős
R Durrett
R Warren
S Geman
S Hannenhalli
W Hastings
WM Fitch
Y Ajana
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2015

Even for moderate-size inputs, there is a tremendous number of optimal rearrangement scenarios, regardless of the model and of the specific question to be answered. Therefore, reporting a single optimal solution can be misleading and cannot support statistical inference. Statistically well-founded methods are necessary to sample uniformly from the solution space; a small number of samples then suffices for statistical inference.
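To give a sense of how quickly the number of optimal scenarios grows, the sketch below counts optimal DCJ sorting scenarios for genomes made only of circular chromosomes. It relies on a closed form stated here without proof: a cycle of the adjacency graph with k adjacencies per genome needs k - 1 DCJs and has k^(k-2) optimal sortings, and because every optimal DCJ acts within a single cycle, operations on different cycles interleave freely. The function names are hypothetical, and the input is assumed to be the list of cycle sizes rather than the genomes themselves.

```python
from math import factorial, prod


def sorting_scenarios_for_cycle(k):
    """Optimal DCJ sortings of one cycle with k adjacencies per genome:
    k ** (k - 2), which equals 1 for k = 1 and k = 2."""
    return 1 if k <= 2 else k ** (k - 2)


def count_dcj_scenarios(cycle_sizes):
    """Total optimal DCJ scenarios, given the size (adjacencies per genome)
    of each cycle in the adjacency graph of two circular genomes."""
    steps = [k - 1 for k in cycle_sizes]  # each cycle needs k - 1 DCJs
    # Multinomial coefficient: ways to interleave the per-cycle sequences.
    interleavings = factorial(sum(steps)) // prod(factorial(s) for s in steps)
    return interleavings * prod(sorting_scenarios_for_cycle(k) for k in cycle_sizes)


print(count_dcj_scenarios([4]))     # 16
print(count_dcj_scenarios([3, 3]))  # 54
print(count_dcj_scenarios([10]))    # 100000000
```

A single cycle with ten adjacencies per genome already yields 10^8 optimal scenarios, which is why uniform sampling, rather than enumeration, is the practical route to statistics over the solution space.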

SZTAKI Publication Repository

Multichromosomal median and halving problems under different genomic distances

Author: A Bergeron
A Caprara
C Zheng
Chunfang Zheng
D Bryant
D Sankoff
David Sankoff
E Ohlebusch
E Tannier
Eric Tannier
G Bourque
G Fertin
G Jean
G Tesler
G Watterson
I Pe'er
J Aury
J Mixtacki
L Lovasz
M Alekseyev
M Bernt
M Ozery-Flato
MR Garey
N El-Mabrouk
P Berman
P Pevzner
R Lenne
R Warren
S Hannenhalli
S Otto
S Yancopoulos
W Xu
X Chen
Y Lin
YC Lin
Z Adam
Publication venue: BioMed Central
Publication date: 01/01/2009

Background: Genome median and genome halving are combinatorial optimization problems that aim at reconstructing ancestral genomes as well as the evolutionary events leading from the ancestor to extant species. Exploring complexity issues is a first step towards devising efficient algorithms. The complexity of the median problem for unichromosomal genomes (permutations) has been settled for both the breakpoint distance and the reversal distance. Although the multichromosomal case has often been assumed to be a simple generalization of the unichromosomal case, it is also a relaxation, so its complexity does not follow from existing results and is open for all distances.

Results: We settle here the complexity of several genome median and halving problems, including a surprising polynomial result for the breakpoint median and guided halving problems in genomes with circular and linear chromosomes, showing that the multichromosomal problem is actually easier than the unichromosomal problem. Still other variants of these problems are NP-complete, including the DCJ double distance problem, previously mentioned as an open question. We list the remaining open problems.

Conclusion: This theoretical study clears up a wide swathe of the algorithmic study of genome rearrangements with multiple multichromosomal genomes.
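The polynomial breakpoint-median result rests, as we read it, on the observation that when both circular and linear chromosomes are allowed, any set of pairwise disjoint adjacencies is a valid genome (unused extremities become telomeres). A median maximizing the adjacencies shared with the inputs is then a maximum weight matching on gene extremities, with each adjacency weighted by the number of input genomes containing it. The sketch below uses this simplified objective and, for brevity, brute-force search in place of a polynomial-time matching algorithm; the names `adj` and `breakpoint_median` are hypothetical.

```python
from collections import Counter


def adj(x, y):
    """An adjacency: an unordered pair of gene extremities such as "1h", "2t"."""
    return frozenset((x, y))


def breakpoint_median(genomes):
    """Pick pairwise disjoint adjacencies maximizing how many input genomes
    share them. Each genome is a set of adjacencies. Brute force over subsets
    of candidate adjacencies, so only suitable for toy inputs."""
    weight = Counter(a for g in genomes for a in g)
    edges = sorted(weight, key=sorted)  # deterministic order

    def search(i, used):
        if i == len(edges):
            return 0, []
        best = search(i + 1, used)  # option 1: leave edges[i] out
        e = edges[i]
        if not (e & used):  # option 2: include it if both extremities are free
            w, rest = search(i + 1, used | e)
            if w + weight[e] > best[0]:
                best = (w + weight[e], [e] + rest)
        return best

    return search(0, frozenset())


A = {adj("1h", "2t"), adj("2h", "3t")}
B = {adj("1h", "2t"), adj("2h", "3h")}
C = {adj("1h", "2t"), adj("2h", "3t"), adj("3h", "1t")}
score, median = breakpoint_median([A, B, C])
print(score)  # 6
```

The adjacencies 2h-3t and 2h-3h compete for the extremity 2h, and the matching keeps the one supported by more input genomes.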

Directory of Open Access Journals

INRIA a CCSD electronic archive server